# Full text of "MixEst: An Estimation Toolbox for Mixture Models"

arXiv:1507.06065v1 [stat.ML] 22 Jul 2015

Journal of Machine Learning Research 1 (2015) 1-5. Submitted 4/00; Published 10/00

Reshad Hosseini (reshad.hosseini@ut.ac.ir), School of ECE, College of Engineering, University of Tehran, Tehran, Iran

Mohamadreza Mash’al (mrmashal@ut.ac.ir), School of ECE, College of Engineering, University of Tehran, Tehran, Iran

## Abstract

Mixture models are powerful statistical models used in many applications ranging from density estimation to clustering and classification. When dealing with mixture models, there are many issues that the experimenter should be aware of and needs to solve. The MixEst toolbox is a powerful and user-friendly package for MATLAB that implements several state-of-the-art approaches to address these problems. Additionally, MixEst offers manifold optimization for fitting the density model, a feature specific to this toolbox. MixEst simplifies the use and integration of mixture models in statistical models and applications. To develop mixture models of new densities, the user only needs to provide a few functions for that statistical distribution, and the toolbox takes care of all the issues regarding mixture models. MixEst is available at visionlab.ut.ac.ir/mixest, is fully documented and is licensed under GPL.

Keywords: mixture models, mixtures of experts, manifold optimization, expectation-maximization, stochastic optimization

## 1. Introduction

Mixture models are an integral and fundamental component in many machine learning problems ranging from clustering to regression and classification (McLachlan and Peel, 2000). Estimating the parameters of mixture models is a challenging task because the following issues in mixture modeling have to be solved:

- Unboundedness of the likelihood: This problem occurs when one component gets a small number of data points and its likelihood becomes infinite (Ciuperca et al., 2003).
- Local maxima: The log-likelihood objective function for estimating the parameters of mixture models is non-concave and has many local maxima (Ueda et al., 2000).
- Correct number of components: In many applications, the correct number of components has to be found (Khalili and Chen, 2007).

Addressing these issues for a mixture density that is not available in common mixture modeling toolboxes costs the experimenter a lot of time and effort. MixEst addresses all these issues not only for already implemented densities, but also for densities that the user may implement. By implementing densities, we mean implementing a few simple functions, which are briefly discussed in Section 3.

©2015 Reshad Hosseini and Mohamadreza Mash’al.

This toolbox provides a framework for applying manifold optimization to estimate the parameters of mixture models. This is an important feature of the toolbox, because recent empirical evidence shows that manifold optimization can surpass expectation maximization in the case of mixtures of Gaussians (Hosseini and Sra, 2015). It also opens the door to large-scale optimization through stochastic optimization methods. Stochastic optimization also allows solving the likelihood unboundedness problem mentioned above without the need to implement a penalizing function for the parameters of the density.

While several libraries are available for working with mixture models, to the best of our knowledge, none of them offers a modular and flexible framework that allows fine-tuning the model structure, or provides universal algorithms for estimating model parameters that solve all the problems listed above. A review of the features available in some libraries is given in Section 4. In the next section, we give a short overview of the toolbox and its features.

## 2. About the MixEst Toolbox

This toolbox offers methods for constructing and estimating mixtures for joint density and conditional density modeling. It is therefore applicable to a wide variety of applications such as clustering, regression and classification through a probabilistic model-based approach.

Each distribution in this toolbox is a structure containing a manifold structure, which represents the parameter space of the distribution, along with several function handles implementing density-specific functions such as log-likelihood and sampling. Distribution structures are constructed by calling factory functions with appropriate input arguments defining the distribution. For example, to construct a mixture of one-dimensional Gaussians with 2 components, it suffices to write the following commands in MATLAB:

```matlab
Dmvn = mvnfactory(1);
Dmix = mixturefactory(Dmvn, 2);
```

As an example of how to invoke a function handle, consider generating 1000 samples from the previously defined mixture:

```matlab
theta.D{1}.mu = 0; theta.D{1}.sigma = 1; % mean and variance of the 1st component
theta.D{2}.mu = 5; theta.D{2}.sigma = 2; % mean and variance of the 2nd component
theta.p = [0.8 0.2]; % weighting coefficients of components
data = Dmix.sample(theta, 1000);
```

Each distribution structure exposes a common interface that the optimization algorithms in the toolbox can use to estimate its parameters. In addition to the EM algorithm, which is commonly implemented in available libraries, our toolbox also makes optimization on manifolds available, featuring procedures such as early stopping and mini-batching to avoid overfitting. For optimization on manifolds, our toolbox depends on the optimization procedures of an excellent toolbox called Manopt (Boumal et al., 2014). In addition to the optimization algorithms of Manopt, such as steepest descent, conjugate gradient and trust-region methods, the user can also use our implementation of the Riemannian LBFGS method.

## 3. Model Development

MixEst includes many joint and conditional distributions to model data ranging from continuous to discrete and also directional. Some users, however, may want to apply the tools developed in this toolbox to mixtures of a distribution that is not yet available in the toolbox. To this end, the user needs to write a factory function that constructs a structure for the new distribution. Each distribution structure has a field named “M” determining the manifold of its parameter space. For example, in the case of the multivariate Gaussian distribution, this is a product manifold of a positive definite manifold and a Euclidean manifold:

```matlab
% datadim is the function input argument determining the dimensionality of data
muM = euclideanfactory(datadim);
sigmaM = spdfactory(datadim);
D.M = productmanifold(struct('mu', muM, 'sigma', sigmaM));
```

The manifold of the parameter space completely determines how the parameter structure is given to or returned by different functions. The structure of parameters for the multivariate Gaussian has two fields, a mean vector “mu” and a covariance matrix “sigma”.

To use the estimation tools of the toolbox, two main functions have to be implemented: the weighted log-likelihood (wll) function and a function for computing the gradient of sum-wll with respect to the distribution parameters. The syntax for calling the wll function is:

```matlab
llvec = D.llvec(theta, data);
```

The input argument `theta` is a structure containing the input parameters of the corresponding distribution. The second input argument `data` can be either a data matrix or a structure having several fields such as the data matrix and weights, which is interpreted using the `mxe_readdata` function. The output argument `llvec` is a vector with entries equal to the wll for each datum (each column) in the data matrix.

The function to compute the gradient of sum-wll has the following syntax:

```matlab
llgrad = D.llgrad(theta, data);
```

The input arguments are the same as for the function `llvec`.
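
The two calling conventions above can be illustrated with a small stand-alone sketch for a one-dimensional Gaussian. This is not MixEst code: the names `gauss_llvec` and `gauss_llgrad` and the explicit weight argument `w` are hypothetical (the toolbox reads weights from the `data` structure through `mxe_readdata` instead), and `theta.sigma` is assumed to hold the variance, as in the sampling example of Section 2. The gradient shown is the plain partial derivative with respect to each field; how the toolbox relates it to the manifold is not covered here.

```matlab
% Hypothetical stand-alone sketch, not part of MixEst.
% theta.mu is the mean and theta.sigma the variance of a 1-D Gaussian.
% x is a 1-by-N row of data (one datum per column), w a 1-by-N row of weights.
gauss_llvec = @(theta, x, w) ...
    w .* (-0.5 * log(2 * pi * theta.sigma) ...
          - (x - theta.mu).^2 ./ (2 * theta.sigma));

% Gradient of sum(llvec) with respect to each parameter, returned as a
% structure with the same fields as theta.
gauss_llgrad = @(theta, x, w) struct( ...
    'mu',    sum(w .* (x - theta.mu)) / theta.sigma, ...
    'sigma', sum(w .* ((x - theta.mu).^2 - theta.sigma)) / (2 * theta.sigma^2));

theta = struct('mu', 0, 'sigma', 1);
x = [-1 0 2];
w = [1 1 0.5];

llvec = gauss_llvec(theta, x, w);   % one weighted log-likelihood per datum
g = gauss_llgrad(theta, x, w);      % fields g.mu and g.sigma

% Forward finite-difference check of the mu component of the gradient.
h = 1e-6;
theta_h = theta;
theta_h.mu = theta.mu + h;
fd_mu = (sum(gauss_llvec(theta_h, x, w)) - sum(llvec)) / h;
```

Returning the gradient as a structure that mirrors `theta` keeps each partial derivative next to the parameter it belongs to, and a finite-difference check of this kind is a cheap way to catch sign or scaling mistakes when writing such functions for a new distribution.
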
The output argument `llgrad` is a structure with the same fields as the input argument `theta`, holding the gradient of sum-wll with respect to each parameter.

Some other (optional) functions that can be implemented for distributions are:

- `init`: This function initializes the estimator using the data.
- `estimatedefault`: If the maximum wll has a structure that allows fast optimization (or has a closed-form solution), this estimator can be implemented in this function. When this function is not present, Riemannian optimization is called in the maximization step of the EM algorithm.
- `llgraddata`: This function computes the gradient of the wll with respect to the data. It is required in some special cases, such as when the distribution is used as the radial component of an elliptically-contoured distribution or as the components in independent component analysis.
- `ll`: This function is the sum-wll (the sum of the output vector of the `llvec` function). Sometimes it is faster to write this function differently than just calling `llvec` and summing up its output vector.

Two other functions that can be used in the split-and-merge algorithms to avoid local maxima of mixture models are `kl` (for computing the KL-divergence) and `entropy` (for computing the entropy). If the user wants to obtain a maximum-a-posteriori estimate, the functions `penalizerparam`, `penalizercost` and `penalizergrad` need to be implemented.

## 4. Feature Comparison

To demonstrate the richness of features in MixEst, we compare its features with those of several other well-known packages in Table 1. Among the many toolboxes available for mixture modeling, we select those that are feature-rich and representative. These packages are Sklearn (Pedregosa et al., 2011), Mclust (Fraley and Raftery, 1999), FlexMix (Leisch, 2004), Bayes Net (Murphy, 2001) and MixMod (Biernacki et al., 2006). We include Bayes Net to demonstrate what a generic Bayesian graphical modeling toolbox can do.
Sklearn is a powerful machine learning toolbox containing many tools, among them tools specific to mixture modeling. MixMod also provides bindings for Scilab and Matlab.

Table 1: Feature comparison of our toolbox and some other well-known packages. Different rows correspond to the following specifications of the toolboxes:

1. Programming language
2. Approaches for solving the local maxima problem (SM stands for the split-and-merge approach, IDMM for infinite Dirichlet mixture models, HC for initialization using hierarchical clustering)
3. Manifold optimization
4. Bayesian approaches for inference (MAP stands for maximum-a-posteriori, VB for variational Bayes)
5. Large-scale optimization (SEM stands for stochastic EM, MB for mini-batching)
6. Having tools for model selection
7. Automatic model selection (CSM stands for competitive split-and-merge)
8. Ease of extensibility
9. Having mixtures of experts
10. Having mixtures of classifiers
11. Having mixtures of regressors

| # | MixEst | SKlearn | Mclust | FlexMix | Bayes Net | MixMod |
|---|--------|---------|--------|---------|-----------|--------|
| 1 | Matlab | Python | R | R | Matlab | C++ |
| 3 | Yes | No | No | No | No | No |
| 6 | Yes | No | Yes | No | No | Yes |
| 9 | Yes | No | No | No | Yes | No |
| 10 | Yes | No | No | No | Yes | No |
| 11 | Yes | No | No | Yes | Yes | No |

Rows 2, 4, 5, 7 and 8 contain blank cells; their entries are listed here in left-to-right order, without column positions:

| # | Entries |
|---|---------|
| 2 | SM, IDMM, HC |
| 4 | MAP, VB, MAP, MAP, SM |
| 5 | MB, SEM |
| 7 | CSM, IDMM |
| 8 | Easy, Easy, Medium |

## References

Christophe Biernacki, Gilles Celeux, Gerard Govaert, and Florent Langrognet. Model-based cluster and discriminant analysis with the mixmod software. Computational Statistics and Data Analysis, 51(2):587-600, 2006.

Nicolas Boumal, Bamdev Mishra, P.-A. Absil, and Rodolphe Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15:1455-1459, 2014.

Gabriela Ciuperca, Andrea Ridolfi, and Jerome Idier. Penalized maximum likelihood estimator for normal mixtures. Scandinavian Journal of Statistics, 30(1):45-59, March 2003.

Chris Fraley and Adrian E Raftery. Mclust: Software for model-based cluster analysis. Journal of Classification, 16(2):297-306, 1999.

Reshad Hosseini and Suvrit Sra. Manifold optimization for Gaussian mixture models. arXiv preprint arXiv:1506.07677, 06 2015.

Abbas Khalili and Jiahua Chen. Variable selection in finite mixture of regression models. Journal of the American Statistical Association, 102(479):1025-1038, September 2007.

Friedrich Leisch. FlexMix: A general framework for finite mixture models and latent class regression in R. Journal of Statistical Software, 11(8):1-18, 2004.

Geoffrey McLachlan and David Peel. Finite mixture models. John Wiley and Sons, New Jersey, 2000.

Kevin P. Murphy. The Bayes Net toolbox for Matlab. Computing Science and Statistics, 33:2001, 2001.

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.

Naonori Ueda, Ryohei Nakano, Zoubin Ghahramani, and Geoffrey E. Hinton. Split and merge EM algorithm for improving Gaussian mixture density estimates. The Journal of VLSI Signal Processing, 26(1):133-140, 2000.